Record: Chunk-Based N-gram Backoff + Score-First TTT (0.295 BPB)#809
Open
AayushBaniya2006 wants to merge 2 commits into openai:main from
Conversation
Order-9 chunk-based N-gram eval cache with entropy-adaptive alpha and per-order multipliers, combined with score-first TTT (LoRA). Mean val_bpb 0.29519 across 3 seeds (std 0.00013). Architecture: 11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2, BigramHash(4096), GPTQ int5. 13.4MB artifact, 525s training + 340s eval on 8xH100 SXM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match depth of PR openai#549 README: explain why techniques work, full N-gram cache walkthrough, entropy-adaptive alpha details, compliance section, timing budget with data access column, ablation with deltas, and proper credits to prior work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HGNGNGNGHGNGNGN bro.... my brain
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request on Mar 26, 2026
Today (2026-03-26) the leaderboard was transformed by the eval-time n-gram backoff cache technique. Add comprehensive context for agents:
- URGENT_ngram_backoff_breakthrough.md: full implementation guide with NgramEvalCache code, entropy-adaptive alpha, complementary training, and the priority order for implementation
- latest_sota_snapshot.md: updated with the new PR landscape
- 3 reference code files from top PRs (openai#809 0.295, openai#803 0.442, openai#813 0.667)
The n-gram backoff is purely eval-time: adding it to our existing best checkpoint should immediately jump from 1.119 to ~0.67 BPB. Implementing it is now the single highest-priority task.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
Three variants targeting the 0.187 BPB gap to openai#1:
- bwing_alpha: clip 0.95, alpha 0.05-0.60 (isolate alpha curve)
- bwing_entropy_shift: per-order entropy center shift (isolate)
- bwing_full_port: all openai#809 techniques + fixed order mults (fire first)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
- Cubric 3D back online (CADENCE=32, warm-start)
- Per-order entropy center shift from openai#809
- Alpha 0.05-0.60, clip 0.95
- Our sliding-window TTT spliced in (1 epoch, SGD, freeze 2 blocks)
- TTT runs BEFORE n-gram eval → adapted model feeds n-gram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
- Port openai#809 LoRA TTT: rank-8 adapters on Q/V/LM head, AdamW, Polyak
- Add LoRA injection to CausalSelfAttention, Block, GPT forward paths
- 53s vs our old 410s TTT, 6x better BPB gain
- Cubric 3D ON + entropy shift + alpha 0.05-0.60, clip 0.95
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Programmerryoki added a commit to Programmerryoki/parameter-golf that referenced this pull request on Mar 26, 2026
Implements the breakthrough eval-time technique from PR openai#809 (0.295 BPB):
- BackoffNgramMixer: order-2 to order-9 N-gram cache
- Entropy-adaptive alpha blending (model + N-gram predictions)
- Sequential eval building the cache from already-scored tokens (legal/backward-looking)
- Configurable via NGRAM_EVAL=1 and NGRAM_MAX_ORDER=9 env vars
- GPT.forward() now supports _return_logits mode for N-gram blending
Enable with: export NGRAM_EVAL=1 NGRAM_MAX_ORDER=9
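For readers porting this, a minimal sketch of the entropy-adaptive blending step; the function name, the alpha bounds, and the linear entropy-to-alpha map are illustrative assumptions, not the actual BackoffNgramMixer internals:

```python
import torch
import torch.nn.functional as F

def blend_with_ngram(model_logits, ngram_probs, alpha_min=0.05, alpha_max=0.60):
    """Blend model and N-gram distributions, trusting the N-gram more
    where the model is uncertain (high predictive entropy)."""
    model_probs = F.softmax(model_logits, dim=-1)                   # (T, V)
    entropy = -(model_probs * (model_probs + 1e-9).log()).sum(-1)   # (T,)
    max_entropy = torch.log(torch.tensor(float(model_probs.size(-1))))
    # Map normalized entropy in [0, 1] linearly onto [alpha_min, alpha_max].
    alpha = (alpha_min + (alpha_max - alpha_min) * entropy / max_entropy)
    mixed = (1.0 - alpha[:, None]) * model_probs + alpha[:, None] * ngram_probs
    return mixed.clamp_min(1e-9).log()                              # log-probs for BPB
```

The key property is that a confident model prediction is left nearly untouched, while a flat one is pulled toward the N-gram counts.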
Robby955 added a commit to Robby955/parameter-golf that referenced this pull request on Mar 26, 2026
Add complementary training (from @pentxayc openai#803) and per-order multipliers (from @AayushBaniya2006 openai#809) on top of distributed prefill + 15-gram + order-adaptive gating.
New 3-seed results: 0.28798 / 0.28804 / 0.28810
All seeds under 16MB, training under 560s, eval under 330s. Updated README with legality hedge, full ablation, credits.
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request on Mar 26, 2026
…k trivial proposals
- research_memory.md: add PARADIGM SHIFT header, correct the eval_011 conclusion (failed due to naive/slow implementation, not because n-gram doesn't work), add OVERRIDING note in Open Hypotheses directing agents to PR openai#809 code
- codex_research_prompt.txt: add explicit ban on trivial proposals (random seed, minor hyperparams) in aggressive phase; add eval_011 correction note so agents use the correct vectorized chunk-based n-gram approach
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request on Mar 26, 2026
The Negative Results section said 'do not retry n-gram/lambda sweeps' and 'eval_011 does not justify cross-seed confirmation'. These entries would block agents from implementing the correct PR openai#809 vectorized n-gram cache.
Replace with the correct framing: eval_011's naive per-segment implementation was the problem (1901s, 3× over budget), not the concept. The correct vectorized chunk-based approach achieves 0.2952 BPB in 287s.
Also supersede the 'next single-variable refinement' hypothesis entry, which assumed we were in the refinement phase; we are now in the aggressive phase (gap=0.827).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
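The speedup comes from counting every n-gram in a chunk in one device-side pass rather than looping per segment. A rough illustration of the vectorized idea (an assumption, not eval_011's or the PR's exact code):

```python
import torch

def count_ngrams_vectorized(tokens: torch.Tensor, k: int):
    """tokens: 1-D LongTensor holding a whole chunk. Returns the unique
    k-grams (as rows) and their counts in a single batched operation."""
    windows = tokens.unfold(0, k, 1)  # (N-k+1, k) overlapping views, no copy
    grams, counts = torch.unique(windows, dim=0, return_counts=True)
    return grams, counts
```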
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request on Mar 26, 2026
…(legality review)
- SOTA target is now PR openai#803: Complementary Training + Backoff N-gram + TTT
- PR openai#809 (0.2952) excluded pending legality review
- research_memory.md: fix Working SOTA Anchor section (agent had written it to explicitly ignore the URGENT file and stick to 1.1194; removed that)
- All PR openai#809 references updated to PR openai#803/openai#813
- Dashboard: SOTA now 0.4416, gap 0.681
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
quietsmile added a commit to quietsmile/parameter-golf that referenced this pull request on Mar 26, 2026
Extended eval-time n-gram backoff from order 9 to order 12, reduced chunk size from 1M to 256K tokens for faster cache refresh, and increased alpha_max from 0.60 to 0.70.
Two-seed validation: 0.2835 (seed=1337), 0.2833 (seed=42). Improvement over the PR openai#809 baseline: -0.0118 BPB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
openai#809 uses INT5: more aggressive quantization creates more entropy in the post-quant model, letting n-gram eval rescue harder. Their quant loss is 0.019 vs our 0.006 (INT6), but n-gram extracts 0.869 vs 0.668.
Changes from bwing_IV:
- clip_range: 31 → 15 in gptq_quantize_weight, quantize_int6_per_row, and _find_best_row_scales
- No cubric (it hurt in bwing_V)
- 9 hash primes (from bwing_IV)
- All openai#809 n-gram params (fixed mults, entropy shift, alpha curve)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
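To make the clip_range change concrete, a hedged sketch of symmetric per-row quantization: clip=15 gives INT5-style levels (-15..15), clip=31 gives INT6. The function below is illustrative, not the repo's actual gptq_quantize_weight:

```python
import torch

def quantize_per_row(w: torch.Tensor, clip: int = 15) -> torch.Tensor:
    """Round each row of w to integer codes in [-clip, clip], then dequantize.
    A smaller clip means a coarser grid and higher quant loss (and, per the
    note above, more residual entropy for the n-gram cache to rescue)."""
    scale = (w.abs().amax(dim=1, keepdim=True) / clip).clamp_min(1e-12)
    q = torch.clamp(torch.round(w / scale), -clip, clip)  # integer codes
    return q * scale                                      # dequantized weights
```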
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
openai#809 trains for 525s, leaving 75s for GPTQ. We were using the full 600s default. 570s leaves 30s for GPTQ calibrate (3.4s) + quantize (~25s) with headroom. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
Green_1 scored 0.3200 BPB with oracle alpha alone. Green_2 adds LoRA TTT to close the remaining 0.025 gap to openai#809 (0.2952).
TTT flow (score-first, legal):
1. Sliding-window eval scores all val tokens (frozen model)
2. LoRA rank-8 adapters injected on Q, V projections
3. Single pass over val tokens: score then adapt (AdamW, lr=3e-4)
4. Polyak averaging (decay=0.998) for stability
5. N-gram eval with oracle alpha on adapted model
Coarse stride (16x) keeps TTT under 60s. Total eval budget: ~290s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
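A minimal sketch of steps 3-4 above; the model call signature, the chunking, and the helper names are assumptions, not the actual Green_2 code:

```python
import torch

def score_first_ttt(model, lora_params, chunks, lr=3e-4, decay=0.998):
    """Score each chunk with the current weights BEFORE adapting on it,
    so no score ever reflects training on that chunk's own tokens."""
    opt = torch.optim.AdamW(lora_params, lr=lr)
    shadow = [p.detach().clone() for p in lora_params]  # Polyak average
    total_nll, total_tok = 0.0, 0
    for x, y in chunks:                        # sequential val chunks
        with torch.no_grad():                  # 1) score first, frozen
            loss = model(x, targets=y)         # assumed: returns mean NLL
        total_nll += loss.item() * y.numel()
        total_tok += y.numel()
        loss = model(x, targets=y)             # 2) then adapt on the same chunk
        opt.zero_grad()
        loss.backward()
        opt.step()
        for s, p in zip(shadow, lora_params):  # 3) Polyak for stability
            s.mul_(decay).add_(p.detach(), alpha=1.0 - decay)
    return total_nll / total_tok, shadow
```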
RoyiRa added a commit to RoyiRa/parameter-golf that referenced this pull request on Mar 26, 2026
order=9, alpha=0.95, temp=0.85, prune=4%, NGRAM_ORDER_ADAPTIVE=1
Per-order entropy thresholds + multipliers: 0.5862 -> 0.2071 (-0.379!)
Beats PR openai#809 (0.295 BPB), which was the competition leader. V62 (phrase cache stacked on top) now running.
Progression this session:
- V27 start: 1.0541
- V28 n-gram cache: 0.9897
- V31 alpha+order tuning: 0.8802
- V45 temp sharpening: 0.7775
- V59 full-chunk sharing: 0.5865
- V61 order-adaptive: 0.2071
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
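For context, a rough sketch of per-order backoff with multipliers; the multiplier values and the cache layout are placeholders rather than the tuned V61 settings, and the per-order entropy thresholds are elided:

```python
import torch

# Placeholder weights: longer contexts are trusted more when they match.
ORDER_MULT = {9: 1.0, 8: 0.9, 7: 0.8, 6: 0.7, 5: 0.6, 4: 0.5, 3: 0.4, 2: 0.3}

def backoff_distribution(caches, context, vocab_size):
    """caches: {order: {context_tuple: count_vector}}. Mix all matching
    orders, weighting each by its multiplier; back off when no match."""
    mix = torch.zeros(vocab_size)
    total_w = 0.0
    for order in sorted(caches, reverse=True):    # longest context first
        counts = caches[order].get(tuple(context[-(order - 1):]))
        if counts is None:
            continue                              # back off to shorter order
        w = ORDER_MULT[order]
        mix += w * counts / counts.sum()
        total_w += w
    return mix / total_w if total_w > 0 else None  # None -> model-only token
```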
Author
It's cool to see this already branching into new directions. I'm an undergrad at the University of Texas at Austin who funded this out of pocket for $200; if there are any compute credits available for contributors, I'd love to keep pushing it further!
RichiiiTV pushed a commit to RichiiiTV/parameter-golf that referenced this pull request on Mar 26, 2026
Summary
Approach
Eval-time order-9 N-gram backoff cache is the primary technique. The cache is built incrementally from already-scored validation tokens (score-first, legal per competition rules). Processing in 1M-token sequential chunks with all GPU ranks sharing cache state ensures maximum cache utilization.
Key innovations:
- Entropy-adaptive alpha: the N-gram distribution receives more weight where the model's predictive entropy is high
- Per-order multipliers: each backoff order (2 through 9) gets its own blend weight
- Chunk-based sequential processing with cache state shared across all GPU ranks
Also includes score-first TTT (LoRA rank 8, AdamW) contributing ~0.015 BPB.
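As a minimal sketch of the chunked score-first loop described above: score_chunk and cache.update are assumed helper names, chunk-level cache granularity is an assumption, and the blending itself is elided.

```python
def eval_with_ngram_cache(model, val_tokens, cache, chunk_size=2**20):
    """Score 1M-token chunks sequentially; each chunk is scored against the
    cache built from PREVIOUS chunks only, then folded into the cache."""
    total_nll, total_tok = 0.0, 0
    for start in range(0, len(val_tokens) - 1, chunk_size):
        chunk = val_tokens[start:start + chunk_size + 1]   # +1 for targets
        nll, n = score_chunk(model, chunk, cache)  # blend model + cache probs
        total_nll += nll
        total_tok += n
        cache.update(chunk)                        # only after scoring
    return total_nll / total_tok                   # mean NLL; convert to BPB downstream
```

Because the cache is updated only after a chunk has been scored, no prediction ever sees counts from unscored tokens.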
3-Seed Results
Mean val_bpb 0.29519 across 3 seeds (std 0.00013).
Timing
525s training + 340s eval on 8xH100 SXM.
Architecture
11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2, BigramHash(4096), GPTQ int5, 27.3M params; 13.4MB artifact.
Compliance
Both the N-gram cache and the TTT adapters are strictly score-first: every validation token is scored before it is used to update the cache or the LoRA weights, so no prediction ever depends on tokens that have not yet been scored.